fix(logs): add durable execution diagnostics foundation #3564
PR Summary
Medium Risk
Overview: Refactors lifecycle signaling across Updates
Written by Cursor Bugbot for commit 767775c.
Greptile Summary
This PR establishes a durable execution diagnostics foundation by persisting
Key changes:
Notable behavior change: The response from
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant EC as executeWorkflowCore
participant LS as LoggingSession
participant DB as Database
participant EX as Executor
EC->>LS: safeStart()
EC->>EX: execute() with wrappedOnBlockStart/Complete
loop For each block
EX->>EC: wrappedOnBlockStart(blockId, ...)
EC->>LS: onBlockStart(blockId, startedAt)
LS->>DB: jsonb_set lastStartedBlock (monotonic)
DB-->>LS: ack (tracked in pendingProgressWrites)
LS-->>EC: resolved
EC-->>EX: void userCallback fired separately
EX->>EC: wrappedOnBlockComplete(blockId, output)
EC->>LS: onBlockComplete(blockId, output)
LS->>DB: jsonb_set lastCompletedBlock (monotonic)
LS->>DB: void flushAccumulatedCost (fire-and-forget)
LS-->>EC: resolved
EC-->>EX: void userCallback fired separately
end
EX-->>EC: ExecutionResult
EC->>EC: finalizeExecutionOutcome()
EC->>LS: safeComplete / safeCompleteWithCancellation / safeCompleteWithPause
LS->>LS: drainPendingProgressWrites()
LS->>DB: completeWorkflowExecution (finalizationPath, lastStarted/CompletedBlock, traceSpans)
DB-->>LS: ack
LS-->>EC: resolved
EC->>DB: clearExecutionCancellation
EC->>DB: updateWorkflowRunCounts
EC-->>Caller: ExecutionResult
Last reviewed commit: 9db5e87
@icecrasher321 I think this is an important one. One scenario I have run into is a stuck workflow: I couldn't find any logs, and the run just stayed in the running state. I am planning to open a few more PRs after this one too.
Store last-started and last-completed block markers with finalization metadata so later read surfaces can explain how a run ended without reconstructing executor state.
Await only the persistence needed to keep diagnostics durable before terminal completion while keeping callback failures from changing execution behavior.
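A minimal sketch of the ordering described above, assuming a hypothetical `wrapOnBlockStart` helper and `persistMarker` function (neither name is taken from the PR). The marker write is awaited so it is durable before the executor proceeds, while the user callback is fired separately and its failures are swallowed. Logging marker-write failures instead of rethrowing them is also an assumption of this sketch.

```typescript
// Hypothetical names throughout; this illustrates the pattern, not the PR's code.
type BlockCallback = (blockId: string) => Promise<void> | void;
type PersistMarker = (blockId: string, startedAt: string) => Promise<void>;

function wrapOnBlockStart(
  persistMarker: PersistMarker,
  userCallback?: BlockCallback,
): (blockId: string) => Promise<void> {
  return async (blockId: string): Promise<void> => {
    // Awaited: the diagnostics marker should be written before the block runs.
    try {
      await persistMarker(blockId, new Date().toISOString());
    } catch (err: unknown) {
      // Assumption: a failed diagnostics write is logged, never rethrown.
      console.warn("marker write failed", err);
    }

    // Not awaited: the user callback runs separately and cannot fail the run.
    if (userCallback) {
      void Promise.resolve()
        .then(() => userCallback(blockId))
        .catch((err: unknown) => console.warn("user callback failed", err));
    }
  };
}
```

With this shape, a throwing user callback never rejects the promise the executor awaits, so execution behavior stays the same whether or not the callback succeeds.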
Keep successful fallback output and accumulated cost intact while tightening progress-write draining and deduplicating trace span counting for diagnostics helpers.
9db5e87 to c6d9195
Add the missing AuthType export to the hybrid auth mock so the async execution route test exercises the 202 queueing path instead of crashing with a 500 in CI.
Allow same-millisecond marker writes to replace prior markers and drop the unused diagnostics read helper so this PR stays focused on persistence rather than unread foundation code.
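The same-millisecond rule can be illustrated in isolation. The sketch below assumes a hypothetical in-memory `BlockMarker` shape and `shouldReplaceMarker` helper; the PR itself performs this check in the database write (`jsonb_set` in the sequence diagram), so this only mirrors the comparison, not the actual query.

```typescript
// Hypothetical marker shape; field names are illustrative only.
interface BlockMarker {
  blockId: string;
  timestamp: string; // ISO-8601, millisecond precision
}

// Returns true when `next` should overwrite `current`.
function shouldReplaceMarker(
  current: BlockMarker | undefined,
  next: BlockMarker,
): boolean {
  if (current === undefined) {
    return true;
  }
  // ">=" rather than ">": a write in the same millisecond still replaces the
  // prior marker, while an older write never overwrites a newer one.
  return Date.parse(next.timestamp) >= Date.parse(current.timestamp);
}
```

A strict `>` comparison would silently drop the second of two markers written within the same millisecond, which is the case this commit relaxes.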
Drop the unused helper so this PR only ships the persistence-side status types it actually uses.
Ensure empty-subflow and subflow-error lifecycle callbacks participate in progress-write draining before terminal finalization while still swallowing callback failures.
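One way to picture progress-write draining is a small tracker that remembers in-flight writes and waits for all of them before terminal finalization. The `ProgressWriteTracker` class below is a hypothetical sketch, not the `LoggingSession` implementation; it only assumes that writes are promises and that a failed write should be logged rather than block completion.

```typescript
// Hypothetical helper; names are not taken from the PR.
class ProgressWriteTracker {
  private pending = new Set<Promise<void>>();

  // Register an in-flight write. Failures are logged and swallowed so that
  // drain() never rejects because of a single bad write.
  track(write: Promise<unknown>): void {
    const settled: Promise<void> = write
      .then(
        () => undefined,
        (err: unknown) => console.warn("progress write failed", err),
      )
      .then(() => {
        this.pending.delete(settled);
      });
    this.pending.add(settled);
  }

  // Wait until no writes are in flight. The loop covers writes that are
  // tracked while an earlier batch is still being awaited.
  async drain(): Promise<void> {
    while (this.pending.size > 0) {
      await Promise.all(Array.from(this.pending));
    }
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Under this sketch, lifecycle callbacks for empty subflows and subflow errors would call `track(...)` like any other block callback, and the completion path would `await tracker.drain()` before writing the final record.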
Cool yeah, I'll review these. Thanks.
bugbot run
icecrasher321 left a comment
Tested various cases: API execs, HITL, Manual. No ordering issues and the pattern makes sense.
* fix(mothership): fix mothership file uploads (#3640)
* Fix files
* Fix
* Fix
* fix(workspace): prevent stale placeholder data from corrupting workflow registry on switch
* feat(csp): allow chat UI to be embedded in iframes (#3643)
* feat(csp): allow chat UI to be embedded in iframes
Mirror the existing form embed CSP pattern for chat pages: add getChatEmbedCSPPolicy() with frame-ancestors *, configure /chat/:path* headers in next.config.ts without X-Frame-Options, and early-return in proxy.ts so chat routes skip the strict runtime CSP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(csp): extract shared getEmbedCSPPolicy helper
Deduplicate getChatEmbedCSPPolicy and getFormEmbedCSPPolicy into a shared private helper to prevent future divergence.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(logs): add durable execution diagnostics foundation (#3564)
* fix(logs): persist execution diagnostics markers
Store last-started and last-completed block markers with finalization metadata so later read surfaces can explain how a run ended without reconstructing executor state.
* fix(executor): preserve durable diagnostics ordering
Await only the persistence needed to keep diagnostics durable before terminal completion while keeping callback failures from changing execution behavior.
* fix(logs): preserve fallback diagnostics semantics
Keep successful fallback output and accumulated cost intact while tightening progress-write draining and deduplicating trace span counting for diagnostics helpers.
* fix(api): restore async execute route test mock
Add the missing AuthType export to the hybrid auth mock so the async execution route test exercises the 202 queueing path instead of crashing with a 500 in CI.
* fix(executor): align async block error handling
* fix(logs): tighten marker ordering scope
Allow same-millisecond marker writes to replace prior markers and drop the unused diagnostics read helper so this PR stays focused on persistence rather than unread foundation code.
* fix(logs): remove unused finalization type guard
Drop the unused helper so this PR only ships the persistence-side status types it actually uses.
* fix(executor): await subflow diagnostics callbacks
Ensure empty-subflow and subflow-error lifecycle callbacks participate in progress-write draining before terminal finalization while still swallowing callback failures.
---------
Co-authored-by: test <test@example.com>
Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
* feat(admin): add user search by email and ID, remove table border
- Replace Load Users button with a live search input; query fires on any input
- Email search uses listUsers with contains operator
- User ID search (UUID format) uses admin.getUser directly for exact lookup
- Remove outer border on user table that rendered white in dark mode
- Reset pagination to page 0 on new search
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(admin): replace live search with explicit search button
- Split searchInput (controlled input) from searchQuery (committed value) so the hook only fires on Search click or Enter, not every keystroke
- Gate table render on searchQuery.length > 0 to prevent stale results showing after input is cleared
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Siddharth Ganesan <33737564+Sg312@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: PlaneInABottle <y.mirza.altay@gmail.com>
Co-authored-by: test <test@example.com>
Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
Summary
workflow_execution_logs, including lastStartedBlock, lastCompletedBlock, trace metadata, and finalizationPath
Why
Later jobs and log read-surface fixes depend on a trustworthy execution diagnostics foundation. This PR stores the minimum durable data needed to explain where a run got to and how it ended without pulling in broader API or jobs-surface changes.
Scope
Validation
bun --cwd apps/sim vitest run lib/logs/execution/diagnostics.test.ts lib/logs/execution/logger.test.ts lib/logs/execution/logging-session.test.ts lib/workflows/executor/execution-core.test.ts
bunx @biomejs/biome check apps/sim/lib/workflows/executor/execution-core.ts apps/sim/lib/workflows/executor/execution-core.test.ts apps/sim/lib/logs/execution/logging-session.ts apps/sim/lib/logs/execution/logging-session.test.ts apps/sim/lib/logs/execution/logger.ts apps/sim/executor/orchestrators/loop.ts apps/sim/executor/orchestrators/parallel.ts apps/sim/executor/orchestrators/node.ts apps/sim/executor/utils/subflow-utils.ts apps/sim/executor/execution/block-executor.ts
finalizationPath, lastStartedBlock, and lastCompletedBlock
Follow-ups